VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weather, times of day, and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted in ACCV 201
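The multi-modal recurrent classifier described above can be sketched as follows. This is a minimal NumPy toy assuming early fusion of video and sensor features into a single LSTM, with a per-timestep anticipation loss; the dimensions, fusion scheme, and loss are illustrative assumptions, not the authors' released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps the concatenated [x; h] to the 4 gates."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, D_VID, D_SENS, H, N_CLASSES = 5, 8, 4, 16, 25  # 25 classes, as in VIENA2
W = rng.normal(0, 0.1, (4*H, D_VID + D_SENS + H))
b = np.zeros(4*H)
W_out = rng.normal(0, 0.1, (N_CLASSES, H))

video = rng.normal(size=(T, D_VID))     # per-frame appearance features (stand-in)
sensors = rng.normal(size=(T, D_SENS))  # synchronized sensor measurements (stand-in)

h, c = np.zeros(H), np.zeros(H)
loss = 0.0
y_true = 3  # ground-truth action class for this clip
for t in range(T):
    x = np.concatenate([video[t], sensors[t]])  # early fusion of the two modalities
    h, c = lstm_step(x, h, c, W, b)
    p = softmax(W_out @ h)
    # anticipation-style loss: every prefix of the sequence is penalized,
    # encouraging a correct prediction before the action has completed
    loss += -np.log(p[y_true])
print(round(loss / T, 3))
```

The key point is that the loss is accumulated at every timestep, so the model is rewarded for committing to the right class early rather than only at the end of the clip.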
Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction
The time it takes for a classifier to make an accurate prediction can be
crucial in many behaviour recognition problems. For example, an autonomous
vehicle should detect hazardous pedestrian behaviour early enough for it to
take appropriate measures. In this context, we compare the switching linear
dynamical system (SLDS) and a three-layered bi-directional long short-term
memory (LSTM) neural network, which are applied to infer pedestrian behaviour
from motion tracks. We show that, though the neural network model achieves an
accuracy of 80%, it requires long sequences to achieve this (100 samples or
more). The SLDS has a lower accuracy of 74%, but it achieves this result with
short sequences (10 samples). To our knowledge, such a comparison on sequence
length has not been considered in the literature before. The results provide
key intuition into the suitability of the models for time-critical problems.
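As a toy illustration of why a switching dynamical model can commit after only a few samples, the sketch below classifies a short track by comparing the likelihood of two linear motion hypotheses. All dynamics, noise levels, and the crude likelihood here are invented assumptions for the demo, not the paper's SLDS.

```python
import numpy as np

rng = np.random.default_rng(1)

# two candidate linear dynamics on state [position, velocity]
A = {0: np.array([[1.0, 1.0], [0.0, 1.0]]),   # mode 0: constant velocity
     1: np.array([[1.0, 0.2], [0.0, 0.2]])}   # mode 1: decelerating ("stopping")
Q = 0.05  # process-noise standard deviation

def simulate(mode, T=10, x0=(0.0, 1.0)):
    """Generate a track of observed positions under a single dynamic mode."""
    x = np.array(x0)
    track = []
    for _ in range(T):
        x = A[mode] @ x + rng.normal(0, Q, 2)
        track.append(x[0])  # only position is observed
    return np.array(track)

def log_lik(track, mode, x0=(0.0, 1.0)):
    """Crude likelihood: propagate the mean, score one-step position residuals."""
    x = np.array(x0)
    ll = 0.0
    for z in track:
        x = A[mode] @ x
        ll += -0.5 * ((z - x[0]) / Q) ** 2
        x[0] = z  # re-anchor the position to the observation
    return ll

track = simulate(mode=1, T=10)  # 10 samples: the short-sequence regime above
pred = max((0, 1), key=lambda m: log_lik(track, m))
print(pred)
```

With only 10 observations the per-step residuals already separate the two hypotheses decisively, which mirrors the paper's finding that explicit dynamics models need far shorter sequences than the LSTM.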
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a computer
vision task takes video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, needs to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively.
Comment: DAPI 201
Surgical Video Motion Magnification with Suppression of Instrument Artefacts
Video motion magnification could directly highlight subsurface blood vessels
in endoscopic video in order to prevent inadvertent damage and bleeding.
Applying motion filters to the full surgical image is however sensitive to
residual motion from the surgical instruments and can impede practical
application due to aberration motion artefacts. By storing the temporal filter
response from local spatial frequency information for a single cardiovascular
cycle prior to tool introduction to the scene, a filter can be used to
determine if motion magnification should be active for a spatial region of the
surgical image. In this paper, we propose a strategy to reduce aberration due
to non-physiological motion for surgical video motion magnification. We present
promising results on endoscopic transnasal transsphenoidal pituitary surgery
with a quantitative comparison to recent methods using Structural Similarity
(SSIM), as well as qualitative analysis by comparing spatio-temporal cross
sections of the videos and individual frames.
Comment: Early accept to the International Conference on Medical Image
Computing and Computer Assisted Intervention (MICCAI) 2020. Presentation
available here: https://www.youtube.com/watch?v=kKI_Ygny76Q Supplementary
video available here: https://www.youtube.com/watch?v=8DUkcHI149
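The gating idea in this abstract, storing a reference temporal filter response before tool introduction and disabling magnification where the response later deviates, can be sketched per region as follows. The band edges, threshold, and signals are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

fs = 30.0                    # video frame rate (frames per second)
t = np.arange(0, 4, 1 / fs)  # a 4 s clip

def band_energy(signal, lo=0.8, hi=2.0):
    """Spectral energy of a per-region intensity signal in a cardiac-like band."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, 1 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].sum()

# region showing pure pulsatile motion at ~1.2 Hz (a subsurface vessel)
vessel = 0.1 * np.sin(2 * np.pi * 1.2 * t)
# the same region while an instrument sweeps through: a large non-physiological
# drift whose spectral leakage floods the monitored band
instrument = vessel + 3.0 * t

# reference response, stored for one cardiovascular cycle before tool introduction
ref = band_energy(vessel)

def magnify_allowed(region_signal, ref_energy, tol=3.0):
    """Gate: keep magnification on only while band energy stays near the reference."""
    return band_energy(region_signal) < tol * ref_energy

print(magnify_allowed(vessel, ref), magnify_allowed(instrument, ref))
```

Regions that keep matching their pre-tool reference stay magnified; regions disturbed by instrument motion are suppressed, which is the aberration-reduction strategy the abstract describes.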
Using Phase Instead of Optical Flow for Action Recognition
Currently, the most common motion representation for action recognition is optical flow. Optical flow is based on particle tracking, which adheres to a Lagrangian perspective on dynamics. In contrast to the Lagrangian perspective, the Eulerian model of dynamics does not track, but describes local changes. For video, an Eulerian phase-based motion representation, using complex steerable filters, has recently been employed successfully for motion magnification and video frame interpolation. Inspired by these previous works, here we propose learning Eulerian motion representations in a deep architecture for action recognition. We learn filters in the complex domain in an end-to-end manner. We design these complex filters to resemble complex Gabor filters, typically employed for phase-information extraction. We propose a phase-information extraction module, based on these complex filters, that can be used in any network architecture for extracting Eulerian representations. We experimentally analyze the added value of Eulerian motion representations, as extracted by our proposed phase extraction module, and compare with existing motion representations based on optical flow, on the UCF101 dataset.
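The phase-shift property that makes Eulerian motion representations work can be demonstrated with a hand-built 1D complex Gabor filter: the phase of its response shifts linearly with small translations, so phase differences between frames encode local motion without any tracking. The filter parameters below are arbitrary choices for the demo, not the learned filters from the paper.

```python
import numpy as np

def complex_gabor(size=31, freq=0.1, sigma=5.0):
    """A 1D complex Gabor filter: Gaussian window times a complex exponential."""
    x = np.arange(size) - size // 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.exp(2j * np.pi * freq * x)

def local_phase(signal, gabor):
    """Phase of the complex filter response at the signal centre."""
    resp = np.convolve(signal, gabor, mode="same")
    return np.angle(resp[signal.size // 2])

g = complex_gabor()
x = np.arange(128)
frame0 = np.sin(2 * np.pi * 0.1 * x)          # a texture at the filter's frequency
frame1 = np.sin(2 * np.pi * 0.1 * (x - 1.0))  # the same texture shifted right by 1 px

dphi = local_phase(frame1, g) - local_phase(frame0, g)
# the phase difference, divided by 2*pi*freq, recovers the displacement
displacement = -dphi / (2 * np.pi * 0.1)
print(round(displacement, 2))
```

A learned phase-extraction module generalizes this: instead of a fixed Gabor, the complex filters are trained end-to-end, but the Eulerian principle of reading motion from local phase changes is the same.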
Interest region based motion magnification
by Manisha Verma and Shanmuganathan Rama
An RNN-Based IMM Filter Surrogate
The problem of varying dynamics of tracked objects, such as pedestrians, is traditionally tackled with approaches like the Interacting Multiple Model (IMM) filter using a Bayesian formulation. Following the current trend towards deep neural networks, this paper presents an RNN-based IMM filter surrogate. Similar to an IMM filter solution, the presented RNN-based model assigns a probability value to each performed dynamic and, based on these probabilities, outputs a multi-modal distribution over future pedestrian trajectories. The evaluation is done on synthetic data reflecting prototypical pedestrian maneuvers.
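The surrogate idea can be sketched as follows: a recurrent cell reads the observed track, assigns a probability to each candidate dynamic (as an IMM filter would to its motion models), and emits a mode-weighted multi-modal prediction of the next position. The weights and dynamics below are untrained, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

H, D = 8, 2                       # hidden size, observation dim (x, y)
W_h = rng.normal(0, 0.3, (H, H + D))
W_m = rng.normal(0, 0.3, (2, H))  # head over the two candidate dynamics

# candidate dynamics, analogous to an IMM filter's motion models
modes = {
    "constant_velocity": lambda p, v: p + v,
    "stop": lambda p, v: p,
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

track = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

h = np.zeros(H)
for p in track:                   # simple tanh RNN over the observed track
    h = np.tanh(W_h @ np.concatenate([p, h]))

weights = softmax(W_m @ h)        # per-dynamic probabilities (IMM-style)
v = track[-1] - track[-2]         # last observed velocity
hypotheses = [f(track[-1], v) for f in modes.values()]

# multi-modal output: each hypothesis carries its probability; the expectation
# gives a single point estimate when one is needed
expected = sum(w * hyp for w, hyp in zip(weights, hypotheses))
print(weights.round(3), expected.round(2))
```

In the trained model the mode probabilities come from data rather than random weights, but the output structure, a probability per dynamic plus one trajectory hypothesis each, is the point of the surrogate.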
RED: A simple but effective Baseline Predictor for the TrajNet Benchmark
In recent years, there has been a shift from modeling the tracking problem in a Bayesian formulation towards using deep neural networks. Towards this end, in this paper the effectiveness of various deep neural networks for predicting future pedestrian paths is evaluated. The analyzed deep networks rely solely, as in the traditional approaches, on observed tracklets without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset [39], which assembles a repository of considerable and popular datasets for trajectory prediction. We show how a recurrent encoder with a dense layer stacked on top, referred to as the RED-predictor, is able to achieve the top rank in the TrajNet 2018 challenge compared to more elaborate models. Further, we investigate failure cases, give explanations for the observed phenomena, and give some recommendations for overcoming the demonstrated shortcomings.
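The RED architecture named above, a recurrent encoder with a dense layer on top, can be sketched in a few lines. This toy assumes the common normalizations of feeding per-step offsets into the encoder and predicting the future path as offsets from the last observed position; weights are random stand-ins, and the exact details are assumptions rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

H, D, T_PRED = 16, 2, 12           # hidden size, (x, y), prediction horizon
W_h = rng.normal(0, 0.2, (H, H + D))
W_dense = rng.normal(0, 0.2, (T_PRED * D, H))

def red_predict(tracklet):
    """Encode the observed tracklet, then emit all T_PRED future steps at once."""
    deltas = np.diff(tracklet, axis=0)   # per-step offsets, not absolute coordinates
    h = np.zeros(H)
    for d in deltas:                     # simple tanh recurrent encoder
        h = np.tanh(W_h @ np.concatenate([d, h]))
    # one dense layer maps the final hidden state to the whole future path
    offsets = (W_dense @ h).reshape(T_PRED, D)
    return tracklet[-1] + np.cumsum(offsets, axis=0)  # back to world coordinates

obs = np.stack([np.linspace(0, 7, 8), np.zeros(8)], axis=1)  # 8 observed steps
future = red_predict(obs)
print(future.shape)
```

Predicting offsets relative to the last observation is what makes such a simple baseline competitive: it removes absolute-position bias from the dataset, so the network only has to model motion.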